There is a study on swine vaccination in which the project collaborators need to produce an inactivated vaccine to immunize pigs. To test the efficacy of the experimental inactivated vaccine, the pigs have to be challenged with a virus strain, so a suitable challenge strain must be chosen first.
Using a previously published swine DNA vaccine as the reference, the collaborators want to know which circulating swine influenza A virus (IAV) is suitable to serve as the challenge strain.
Aim: screen H1N1 IAV sequences (provided by the collaborators) against the epitopes in the DNA vaccine to find good matches, using two approaches: Epitope Content Comparison (EpiCC) and JanusMatrix (JMX).
Rank the viruses and select the top matches as candidate challenge strains.
library(tidyverse)
library(gridExtra)
library(ggplot2)
library(plotly)
library(knitr)
Epitope Content Comparison (EpiCC) is a web-based computational method for pairwise comparison of protein sequences based on an immunological property, i.e. T cell epitope content, rather than on sequence identity. Gutiérrez et al. (2017) evaluated its ability to classify swine influenza A virus (IAV) strain relatedness and thereby estimate the cross-protective potential of a vaccine strain against circulating viruses.
Goal: identify the strains with the highest epitope content relatedness to the reference vaccine strain for MHC Class I and Class II (VACCINE_EPITOPES_CLASSI-NTC8684-ERNA41H-7SOJI and VACCINE_EPITOPES_CLASSII-NTC8682-ERNA41H-1SOJII, respectively) across the whole H1N1 swine IAV genome, i.e. all 8 protein segments (antigens).
EpiCC output to process and analyze: 8 output files (one per segment) for each of MHC Class I and Class II.
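The per-segment files can be loaded in one pass. The sketch below assumes each EpiCC output is a CSV file named like `epicc_clsI_segment_1.csv`; both the file format and the naming scheme are hypothetical, so adapt them to the actual outfiles. `check.names = FALSE` matters here because the strain headers contain hyphens, which `read.csv()` would otherwise rewrite as dots and so break the later split on `-`.

```r
# Hypothetical helper: read the 8 per-segment EpiCC outputs for one MHC class.
# Assumes CSV files named "epicc_<class>_segment_<k>.csv" (the naming is an assumption).
read_epicc_segments <- function(dir, class_label = "clsI", n_segments = 8) {
  paths <- file.path(dir, sprintf("epicc_%s_segment_%d.csv", class_label, seq_len(n_segments)))
  # check.names = FALSE keeps headers such as "GB_...-A_SWINE_...-SEGMENT_1" intact
  tables <- lapply(paths, read.csv, check.names = FALSE, stringsAsFactors = FALSE)
  names(tables) <- sprintf("SEGMENT_%d", seq_len(n_segments))
  tables
}

# Usage with toy files written to a temporary directory
demo_dir <- file.path(tempdir(), "epicc_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (k in 1:8) {
  toy <- data.frame(id = "score", stringsAsFactors = FALSE)
  toy[[sprintf("GB_X%d-A_SWINE_IOWA_TOY_2017-SEGMENT_%d", k, k)]] <- 0.5
  write.csv(toy, file.path(demo_dir, sprintf("epicc_clsI_segment_%d.csv", k)), row.names = FALSE)
}
epicc_clsI <- read_epicc_segments(demo_dir, "clsI")
names(epicc_clsI[[1]])
```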
head(colnames(epicc_data_1_clsI))
## [1] "id"
## [2] "GB_KY888027-A_SWINE_KANSAS_A01378019_2017-SEGMENT_1"
## [3] "GB_KY970162-A_SWINE_MICHIGAN_A01259076_2017-SEGMENT_1"
## [4] "GB_MF116355-A_SWINE_KANSAS_A01378027_2017-SEGMENT_1"
## [5] "GB_MF373215-A_SWINE_IOWA_A01672518_2017-SEGMENT_1"
## [6] "GB_MF373233-A_SWINE_NEBRASKA_A01672345_2017-SEGMENT_1"
nrow(epicc_data_1_clsI)
## [1] 1
ncol(epicc_data_1_clsI)
## [1] 74
# Reshape the one-row EpiCC output for segment 1 (PB2) to long format:
# one row per strain, with its EpiCC score in PB2_score
epicc_data_1_clsI_long <- epicc_data_1_clsI %>%
  gather(key = strains, value = PB2_score,
         `GB_KY888027-A_SWINE_KANSAS_A01378019_2017-SEGMENT_1`:`VACCINE_EPITOPES_CLASSI-NTC8684-ERNA41H-7SOJI`) %>%
  select(strains, PB2_score) %>%
  mutate(header = strains, PB2_score = as.numeric(PB2_score)) %>%
  # split "<GenBank ID>-<strain name>-<segment>" into three columns
  separate(header, into = c("ID", "Sequence", "Segment"), sep = "-", extra = "merge")

# Drop the vaccine self-comparison, then sort by score (ties broken by strain name)
epicc_data_1_clsI_sort <- epicc_data_1_clsI_long %>%
  filter(Sequence != "NTC8684") %>%
  arrange(desc(PB2_score), Sequence)

# Keep the 10 best-matching strains for this segment
epicc_data_1_clsI_top10 <- epicc_data_1_clsI_sort %>%
  head(10) %>%
  select(Sequence, Segment)
epicc_data_1_clsI_top10
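Because the same reshape, filter and sort steps are repeated for all 8 segments and both MHC classes, they can be wrapped in one function. The base-R sketch below is illustrative only (the function name is hypothetical); it mirrors the pipeline above, splitting each header on its first two hyphens just as `separate(..., extra = "merge")` does. For Class II, pass `vaccine_id = "NTC8682"`.

```r
# Hypothetical helper: top-n strains for one segment from a one-row EpiCC table
top_epicc_strains <- function(wide, n = 10, vaccine_id = "NTC8684") {
  strain_cols <- setdiff(names(wide), "id")
  # one numeric score per strain column (as.character guards against factor columns)
  scores <- vapply(wide[1, strain_cols, drop = FALSE],
                   function(x) as.numeric(as.character(x)), numeric(1))
  long <- data.frame(
    # header layout: "<GenBank ID>-<strain name>-<segment>"
    Sequence = sub("^[^-]+-([^-]+)-.*$", "\\1", strain_cols),
    Segment  = sub("^[^-]+-[^-]+-", "", strain_cols),
    Score    = unname(scores),
    stringsAsFactors = FALSE
  )
  long <- long[long$Sequence != vaccine_id, ]        # drop the vaccine self-comparison
  long <- long[order(-long$Score, long$Sequence), ]  # best score first, ties by name
  rownames(long) <- NULL
  head(long[, c("Sequence", "Segment")], n)
}

# Toy one-row table in the EpiCC layout (hypothetical strains and scores)
toy <- data.frame(
  id = "score",
  "GB_X1-A_SWINE_IOWA_1_2017-SEGMENT_1" = 0.9,
  "GB_X2-A_SWINE_OHIO_2_2017-SEGMENT_1" = 0.7,
  "GB_X3-A_SWINE_KANSAS_3_2017-SEGMENT_1" = 0.9,
  "VACCINE_EPITOPES_CLASSI-NTC8684-ERNA41H-7SOJI" = 1,
  check.names = FALSE
)
top2 <- top_epicc_strains(toy, n = 2)
top2
```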
## Sequence Segment
## 1 A_SWINE_TEXAS_A02214607_2017 SEGMENT_1
## 2 A_SWINE_IOWA_A01667091_2017 SEGMENT_1
## 3 A_SWINE_OKLAHOMA_A02214419_2017 SEGMENT_1
## 4 A_SWINE_NORTH_CAROLINA_A01672751_2017 SEGMENT_1
## 5 A_SWINE_IOWA_A01104104_2017 SEGMENT_1
## 6 A_SWINE_NEBRASKA_A02216645_2017 SEGMENT_1
## 7 A_SWINE_KANSAS_A01378019_2017 SEGMENT_1
## 8 A_SWINE_KANSAS_A01378038_2017 SEGMENT_1
## 9 A_SWINE_IOWA_A01672518_2017 SEGMENT_1
## 10 A_SWINE_NORTH_CAROLINA_A01672011_2017 SEGMENT_1
head(epicc_data_clsI_top10_all)
## Sequence Segment
## 1 A_SWINE_TEXAS_A02214607_2017 SEGMENT_1
## 2 A_SWINE_IOWA_A01667091_2017 SEGMENT_1
## 3 A_SWINE_OKLAHOMA_A02214419_2017 SEGMENT_1
## 4 A_SWINE_NORTH_CAROLINA_A01672751_2017 SEGMENT_1
## 5 A_SWINE_IOWA_A01104104_2017 SEGMENT_1
## 6 A_SWINE_NEBRASKA_A02216645_2017 SEGMENT_1
tail(epicc_data_clsI_top10_all)
## Sequence Segment
## 75 A_SWINE_MINNESOTA_A02214666_2017 SEGMENT_8
## 76 A_SWINE_KANSAS_A01378027_2017 SEGMENT_8
## 77 A_SWINE_IOWA_A02215041_2017 SEGMENT_8
## 78 A_SWINE_IOWA_A02216456_2017 SEGMENT_8
## 79 A_SWINE_OHIO_A02219547_2017 SEGMENT_8
## 80 A_SWINE_INDIANA_A02216644_2017 SEGMENT_8
unique(epicc_data_clsI_top10_all$Sequence)
## [1] "A_SWINE_TEXAS_A02214607_2017"
## [2] "A_SWINE_IOWA_A01667091_2017"
## [3] "A_SWINE_OKLAHOMA_A02214419_2017"
## [4] "A_SWINE_NORTH_CAROLINA_A01672751_2017"
## [5] "A_SWINE_IOWA_A01104104_2017"
## [6] "A_SWINE_NEBRASKA_A02216645_2017"
## [7] "A_SWINE_KANSAS_A01378019_2017"
## [8] "A_SWINE_KANSAS_A01378038_2017"
## [9] "A_SWINE_IOWA_A01672518_2017"
## [10] "A_SWINE_NORTH_CAROLINA_A01672011_2017"
## [11] "A_SWINE_NEBRASKA_A02219793_2017"
## [12] "A_SWINE_INDIANA_A01672825_2017"
## [13] "A_SWINE_ILLINOIS_A02218178_2017"
## [14] "A_SWINE_IOWA_A02216046_2017"
## [15] "A_SWINE_ILLINOIS_A01932036_2017"
## [16] "A_SWINE_ILLINOIS_A01672343_2017"
## [17] "A_SWINE_ILLINOIS_A02214663_2017"
## [18] "A_SWINE_IOWA_A02214835_2017"
## [19] "A_SWINE_MINNESOTA_A02214666_2017"
## [20] "A_SWINE_IOWA_A02215202_2017"
## [21] "A_SWINE_KANSAS_A01378027_2017"
## [22] "A_SWINE_IOWA_A01667089_2017"
## [23] "A_SWINE_IOWA_A02221505_2017"
## [24] "A_SWINE_IOWA_A02217282_2017"
## [25] "A_SWINE_IOWA_A02215038_2017"
## [26] "A_SWINE_ILLINOIS_A02219783_2017"
## [27] "A_SWINE_OHIO_17TOSU1384_2017"
## [28] "A_SWINE_OHIO_17TOSU1386_2017"
## [29] "A_SWINE_OHIO_A01354304_2017"
## [30] "A_SWINE_OHIO_A01354305_2017"
## [31] "A_SWINE_OKLAHOMA_A01672680_2017"
## [32] "A_SWINE_MISSOURI_A01932424_2017"
## [33] "A_SWINE_NORTH_CAROLINA_A01785281_2017"
## [34] "A_SWINE_KANSAS_A01378037_2017"
## [35] "A_SWINE_IOWA_A02215041_2017"
## [36] "A_SWINE_IOWA_A02216456_2017"
## [37] "A_SWINE_OHIO_A02219547_2017"
## [38] "A_SWINE_INDIANA_A02216644_2017"
unique(epicc_data_clsI_top10_all$Segment)
## [1] "SEGMENT_1" "SEGMENT_2" "SEGMENT_3" "SEGMENT_4" "SEGMENT_5" "SEGMENT_6"
## [7] "SEGMENT_7" "SEGMENT_8"
The segment number indicates which protein that segment of the flu genome encodes: 1: PB2, 2: PB1, 3: PA, 4: HA, 5: NP, 6: NA, 7: M, 8: NS.
most_occurence_strain_clsI <- epicc_data_clsI_top10_all %>%
  group_by(Sequence) %>%
  summarise(count = n()) %>%
  arrange(desc(count)) %>%
  filter(count >= 3)
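When labelling plots or tables, the segment names can be translated to protein names with a named character vector. A small sketch (the object name is illustrative); note that the neuraminidase abbreviation must stay quoted so that R does not read it as a missing value:

```r
# Segment-to-protein lookup, following the numbering given above
segment_protein <- c(
  SEGMENT_1 = "PB2", SEGMENT_2 = "PB1", SEGMENT_3 = "PA", SEGMENT_4 = "HA",
  SEGMENT_5 = "NP",  SEGMENT_6 = "NA",  SEGMENT_7 = "M",  SEGMENT_8 = "NS"
)
# "NA" here is the quoted string for neuraminidase, not R's missing value
segments <- c("SEGMENT_4", "SEGMENT_6", "SEGMENT_8")
proteins <- unname(segment_protein[segments])
proteins
```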
max(most_occurence_strain_clsI$count)
## [1] 5
most_occurence_strain_clsI
## # A tibble: 12 x 2
## Sequence count
## <chr> <int>
## 1 A_SWINE_IOWA_A01672518_2017 5
## 2 A_SWINE_IOWA_A02215202_2017 5
## 3 A_SWINE_IOWA_A02221505_2017 5
## 4 A_SWINE_IOWA_A01104104_2017 4
## 5 A_SWINE_IOWA_A01667091_2017 4
## 6 A_SWINE_IOWA_A02214835_2017 4
## 7 A_SWINE_IOWA_A02215038_2017 4
## 8 A_SWINE_KANSAS_A01378027_2017 4
## 9 A_SWINE_TEXAS_A02214607_2017 4
## 10 A_SWINE_ILLINOIS_A01932036_2017 3
## 11 A_SWINE_KANSAS_A01378038_2017 3
## 12 A_SWINE_NEBRASKA_A02216645_2017 3
Maximum frequency = 5 out of 8. A strain whose EpiCC score placed it in the top 10 for all eight proteins would have a frequency of 8; since the maximum here is only 5, even the most frequent strains appear in the top-10 lists of just 5 proteins. We therefore kept strains with a frequency of 3 to 5.
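The frequency logic can be illustrated on toy data. In this sketch the strains are hypothetical and the lists are top-3 lists over 4 segments instead of top-10 lists over 8; stacking the lists mimics `epicc_data_clsI_top10_all`, and because a strain can enter each segment's list at most once, its frequency can never exceed the number of segments.

```r
# Toy per-segment top-3 lists (hypothetical strains; the real analysis uses top-10 lists for 8 segments)
top_lists <- list(
  SEGMENT_1 = c("STRAIN_A", "STRAIN_B", "STRAIN_C"),
  SEGMENT_2 = c("STRAIN_A", "STRAIN_C", "STRAIN_D"),
  SEGMENT_3 = c("STRAIN_A", "STRAIN_B", "STRAIN_E"),
  SEGMENT_4 = c("STRAIN_B", "STRAIN_C", "STRAIN_F")
)
# Stack the lists into one long table
top_all <- data.frame(
  Sequence = unlist(top_lists, use.names = FALSE),
  Segment  = rep(names(top_lists), lengths(top_lists)),
  stringsAsFactors = FALSE
)
# Frequency = number of segment lists a strain appears in
freq <- table(top_all$Sequence)
max(freq)                      # here 3 out of a possible 4
shortlist <- names(freq)[freq >= 3]
shortlist
```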
head(epicc_data_clsII_top10_all)
## Sequence Segment
## 1 A_SWINE_KANSAS_A01378019_2017 SEGMENT_1
## 2 A_SWINE_MISSOURI_A01932424_2017 SEGMENT_1
## 3 A_SWINE_KANSAS_A01378027_2017 SEGMENT_1
## 4 A_SWINE_OHIO_A02219547_2017 SEGMENT_1
## 5 A_SWINE_NEBRASKA_A02219793_2017 SEGMENT_1
## 6 A_SWINE_OKLAHOMA_A02214419_2017 SEGMENT_1
tail(epicc_data_clsII_top10_all)
## Sequence Segment
## 75 A_SWINE_KANSAS_A01378027_2017 SEGMENT_8
## 76 A_SWINE_NEBRASKA_A02219793_2017 SEGMENT_8
## 77 A_SWINE_NORTH_CAROLINA_A01785281_2017 SEGMENT_8
## 78 A_SWINE_KANSAS_A01378019_2017 SEGMENT_8
## 79 A_SWINE_MISSOURI_A02216048_2017 SEGMENT_8
## 80 A_SWINE_IOWA_A02221506_2017 SEGMENT_8
unique(epicc_data_clsII_top10_all$Sequence)
## [1] "A_SWINE_KANSAS_A01378019_2017"
## [2] "A_SWINE_MISSOURI_A01932424_2017"
## [3] "A_SWINE_KANSAS_A01378027_2017"
## [4] "A_SWINE_OHIO_A02219547_2017"
## [5] "A_SWINE_NEBRASKA_A02219793_2017"
## [6] "A_SWINE_OKLAHOMA_A02214419_2017"
## [7] "A_SWINE_IOWA_A01672342_2017"
## [8] "A_SWINE_KANSAS_A01378038_2017"
## [9] "A_SWINE_ILLINOIS_A02218178_2017"
## [10] "A_SWINE_IOWA_A01667088_2017"
## [11] "A_SWINE_NORTH_CAROLINA_A01672751_2017"
## [12] "A_SWINE_NORTH_CAROLINA_A01785281_2017"
## [13] "A_SWINE_IOWA_A01672824_2017"
## [14] "A_SWINE_IOWA_A02214479_2017"
## [15] "A_SWINE_MISSOURI_A02214279_2017"
## [16] "A_SWINE_NORTH_CAROLINA_A01785282_2017"
## [17] "A_SWINE_ILLINOIS_A01932036_2017"
## [18] "A_SWINE_MISSOURI_A02216048_2017"
## [19] "A_SWINE_IOWA_A02221508_2017"
## [20] "A_SWINE_OKLAHOMA_A01672680_2017"
## [21] "A_SWINE_NORTH_CAROLINA_A01672011_2017"
## [22] "A_SWINE_IOWA_A02215038_2017"
## [23] "A_SWINE_ARKANSAS_A02218161_2017"
## [24] "A_SWINE_TEXAS_A02214607_2017"
## [25] "A_SWINE_IOWA_A02215202_2017"
## [26] "A_SWINE_IOWA_A01104104_2017"
## [27] "A_SWINE_IOWA_A02214835_2017"
## [28] "A_SWINE_NEBRASKA_A02216645_2017"
## [29] "A_SWINE_IOWA_A01672518_2017"
## [30] "A_SWINE_IOWA_A02221505_2017"
## [31] "A_SWINE_IOWA_A02217282_2017"
## [32] "A_SWINE_KANSAS_A01378037_2017"
## [33] "A_SWINE_MINNESOTA_A02214666_2017"
## [34] "A_SWINE_NEBRASKA_A01672345_2017"
## [35] "A_SWINE_ILLINOIS_A02219783_2017"
## [36] "A_SWINE_IOWA_A02221506_2017"
unique(epicc_data_clsII_top10_all$Segment)
## [1] "SEGMENT_1" "SEGMENT_2" "SEGMENT_3" "SEGMENT_4" "SEGMENT_5" "SEGMENT_6"
## [7] "SEGMENT_7" "SEGMENT_8"
most_occurence_strain_clsII <- epicc_data_clsII_top10_all %>%
  group_by(Sequence) %>%
  summarise(count = n()) %>%
  arrange(desc(count)) %>%
  filter(count >= 3)
max(most_occurence_strain_clsII$count)
## [1] 7
most_occurence_strain_clsII
## # A tibble: 14 x 2
## Sequence count
## <chr> <int>
## 1 A_SWINE_KANSAS_A01378027_2017 7
## 2 A_SWINE_ILLINOIS_A01932036_2017 6
## 3 A_SWINE_IOWA_A02215038_2017 4
## 4 A_SWINE_KANSAS_A01378038_2017 4
## 5 A_SWINE_NORTH_CAROLINA_A01785281_2017 4
## 6 A_SWINE_OKLAHOMA_A02214419_2017 4
## 7 A_SWINE_IOWA_A01104104_2017 3
## 8 A_SWINE_IOWA_A01672518_2017 3
## 9 A_SWINE_IOWA_A02215202_2017 3
## 10 A_SWINE_KANSAS_A01378019_2017 3
## 11 A_SWINE_KANSAS_A01378037_2017 3
## 12 A_SWINE_MISSOURI_A02216048_2017 3
## 13 A_SWINE_NEBRASKA_A02216645_2017 3
## 14 A_SWINE_OKLAHOMA_A01672680_2017 3
Maximum frequency = 7 out of 8. Applying the same selection criterion as for the Class I data, we kept strains with a frequency of 3 or more.
Figure 1.2 | Strains found in the top-10 list of each protein, with the number of lists in which each strain appears. This identifies strains that consistently have top EpiCC scores across the whole genome. A reference line marks the cut-off used to shortlist strains.
# Keep Class II shortlisted strains that are also in the Class I shortlist (Class II order is kept)
epicc_overlap <- most_occurence_strain_clsII$Sequence %in% most_occurence_strain_clsI$Sequence
epicc_common <- most_occurence_strain_clsII[epicc_overlap,1]
epicc_common
## # A tibble: 8 x 1
## Sequence
## <chr>
## 1 A_SWINE_KANSAS_A01378027_2017
## 2 A_SWINE_ILLINOIS_A01932036_2017
## 3 A_SWINE_IOWA_A02215038_2017
## 4 A_SWINE_KANSAS_A01378038_2017
## 5 A_SWINE_IOWA_A01104104_2017
## 6 A_SWINE_IOWA_A01672518_2017
## 7 A_SWINE_IOWA_A02215202_2017
## 8 A_SWINE_NEBRASKA_A02216645_2017
Eight swine IAV strains appear in both the Class I and Class II top-EpiCC lists. These 8 strains therefore have relatively high epitope content relatedness to the reference vaccine strain compared with the other H1N1 swine IAV sequences.
JanusMatrix (JMX), also called the Janus Immunogenicity Score (JIS), is a web-based tool that combines a well-established method for MHC (major histocompatibility complex) binding prediction with a novel assessment of the potential for T cell receptor (TCR) binding based on similarity to self. Under this approach, high immunogenicity, i.e. a robust T effector response, requires both good MHC binding and low self-similarity (He et al., 2013).
Here we are not looking for self-similar epitopes; instead, we use JMX to identify the strains with the greatest coverage of the epitopes found in the swine DNA vaccine sequences for MHC Class I and II. EpiCC only tells us how related the strains are in terms of epitope content (EpiCC score), not which epitope sequences make up that content; JMX does.
Input to JMX: the MHC Class I and Class II DNA vaccine sequences were queried against a database of 72 H1N1 swine IAV sequences.
JMX output to process and analyze: 8 HTML output files for each of MHC Class I and Class II.
colnames(jmx_data_2_clsI_strain)
## [1] "Filename" "Sequence" "Segment" "Epitope"
head(jmx_data_2_clsI_strain[,2:4])
## # A tibble: 6 x 3
## Sequence Segment Epitope
## <chr> <chr> <chr>
## 1 A_SWINE_KANSAS_A01378019_2017 SEGMENT_2 DTVNRTHQY
## 2 A_SWINE_MICHIGAN_A01259076_2017 SEGMENT_2 DTVNRTHQY
## 3 A_SWINE_KANSAS_A01378027_2017 SEGMENT_2 DTVNRTHQY
## 4 A_SWINE_IOWA_A01672518_2017 SEGMENT_2 DTVNRTHQY
## 5 A_SWINE_NEBRASKA_A01672345_2017 SEGMENT_2 DTVNRTHQY
## 6 A_SWINE_MINNESOTA_A01672344_2017 SEGMENT_2 DTVNRTHQY
colnames(jmx_data_all_clsI_matrix)
## [1] "Sequence" "Segment" "Epitope" "Count" "Presence"
head(jmx_data_all_clsI_matrix)
## # A tibble: 6 x 5
## # Groups: Sequence, Segment [2]
## Sequence Segment Epitope Count Presence
## <chr> <chr> <chr> <int> <dbl>
## 1 A_SWINE_ARKANSAS_A02218161_2017 SEGMENT_2 DTVNRTHQY 1 1
## 2 A_SWINE_ARKANSAS_A02218161_2017 SEGMENT_4 GMIDGWYGY 2 1
## 3 A_SWINE_ARKANSAS_A02218161_2017 SEGMENT_4 NADTLCIGY 1 1
## 4 A_SWINE_ARKANSAS_A02218161_2017 SEGMENT_4 RIYQILAIY 1 1
## 5 A_SWINE_ARKANSAS_A02218161_2017 SEGMENT_4 SVKNGTYDY 1 1
## 6 A_SWINE_ARKANSAS_A02218161_2017 SEGMENT_4 TSADQQSLY 1 1
Figure 2.1 | H1N1 swine IAV strains plotted against the epitopes found in the DNA vaccine. Blue spots indicate that the epitope is present; blank spots indicate that it is absent. Read horizontally, the plot shows which epitopes are present and how conserved they are across strains; read vertically, it shows how many epitopes are found in a particular IAV strain.
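The presence/absence grid behind such a plot can be derived from the long table with base R's `table()`. This is a sketch on toy data (strain names are hypothetical; the epitopes are borrowed from the Class I output), not the code used to draw Figure 2.1. A strain hitting the same epitope twice, as `Count = 2` in the output above, still counts as a single presence.

```r
# Toy long-format JMX hits (hypothetical strains; epitopes borrowed from the Class I output)
jmx_hits <- data.frame(
  Sequence = c("STRAIN_X", "STRAIN_X", "STRAIN_X", "STRAIN_Y", "STRAIN_Y", "STRAIN_Y"),
  Epitope  = c("DTVNRTHQY", "GMIDGWYGY", "GMIDGWYGY", "DTVNRTHQY", "GMIDGWYGY", "NADTLCIGY"),
  stringsAsFactors = FALSE
)
# Strain x epitope counts, collapsed to 1 (present) / 0 (absent)
presence <- (table(jmx_hits$Sequence, jmx_hits$Epitope) > 0) * 1
presence
# Number of vaccine epitopes found in each strain (the figure's vertical view)
epitopes_per_strain <- rowSums(presence)
# Number of strains carrying each epitope, i.e. conservation (the horizontal view)
strains_per_epitope <- colSums(presence)
```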
n_occur
## Var1 Freq
## 2 A_SWINE_ILLINOIS_A01672343_2017 16
## 4 A_SWINE_ILLINOIS_A02214663_2017 16
## 6 A_SWINE_ILLINOIS_A02218178_2017 16
## 7 A_SWINE_ILLINOIS_A02219783_2017 16
## 8 A_SWINE_INDIANA_A01672825_2017 16
## 13 A_SWINE_IOWA_A01667088_2017 16
## 17 A_SWINE_IOWA_A01672415_2017 16
## 27 A_SWINE_IOWA_A02216046_2017 16
## 30 A_SWINE_IOWA_A02217313_2017 16
## 35 A_SWINE_IOWA_A02221506_2017 16
## 37 A_SWINE_KANSAS_A01378019_2017 16
## 39 A_SWINE_KANSAS_A01378037_2017 16
## 40 A_SWINE_KANSAS_A01378038_2017 16
## 41 A_SWINE_MICHIGAN_A01259076_2017 16
## 42 A_SWINE_MICHIGAN_A02214235_2017 16
## 45 A_SWINE_MINNESOTA_A02214666_2017 16
## 52 A_SWINE_NEBRASKA_A01672345_2017 16
## 55 A_SWINE_NORTH_CAROLINA_A01672011_2017 16
## 58 A_SWINE_NORTH_CAROLINA_A01785282_2017 16
## 68 A_SWINE_OHIO_A02219547_2017 16
## 1 A_SWINE_ARKANSAS_A02218161_2017 15
## 5 A_SWINE_ILLINOIS_A02215204_2017 15
## 10 A_SWINE_INDIANA_A02216644_2017 15
## 11 A_SWINE_INDIANA_A02218180_2017 15
## 12 A_SWINE_IOWA_A01104104_2017 15
## 16 A_SWINE_IOWA_A01672342_2017 15
## 18 A_SWINE_IOWA_A01672518_2017 15
## 19 A_SWINE_IOWA_A01672824_2017 15
## 20 A_SWINE_IOWA_A01932420_2017 15
## 21 A_SWINE_IOWA_A02214479_2017 15
## 22 A_SWINE_IOWA_A02214835_2017 15
## 23 A_SWINE_IOWA_A02215038_2017 15
## 25 A_SWINE_IOWA_A02215202_2017 15
## 26 A_SWINE_IOWA_A02216044_2017 15
## 29 A_SWINE_IOWA_A02217282_2017 15
## 31 A_SWINE_IOWA_A02218171_2017 15
## 32 A_SWINE_IOWA_A02218750_2017 15
## 34 A_SWINE_IOWA_A02221505_2017 15
## 36 A_SWINE_IOWA_A02221508_2017 15
## 38 A_SWINE_KANSAS_A01378027_2017 15
## 44 A_SWINE_MINNESOTA_A01672344_2017 15
## 46 A_SWINE_MINNESOTA_A02214846_2017 15
## 49 A_SWINE_MISSOURI_A02214279_2017 15
## 50 A_SWINE_MISSOURI_A02216048_2017 15
## 51 A_SWINE_MISSOURI_A02218334_2017 15
## 53 A_SWINE_NEBRASKA_A02216645_2017 15
## 54 A_SWINE_NEBRASKA_A02219793_2017 15
## 60 A_SWINE_OHIO_17TOSU1384_2017 15
## 61 A_SWINE_OHIO_17TOSU1386_2017 15
## 62 A_SWINE_OHIO_A01354304_2017 15
## 63 A_SWINE_OHIO_A01354305_2017 15
## 72 A_SWINE_WISCONSIN_A01104100_2017 15
## 3 A_SWINE_ILLINOIS_A01932036_2017 14
## 9 A_SWINE_INDIANA_A02214845_2017 14
## 24 A_SWINE_IOWA_A02215041_2017 14
## 28 A_SWINE_IOWA_A02216456_2017 14
## 33 A_SWINE_IOWA_A02218755_2017 14
## 43 A_SWINE_MINNESOTA_A01667100_2017 14
## 67 A_SWINE_OHIO_A02216472_2017 14
## 14 A_SWINE_IOWA_A01667089_2017 13
## 15 A_SWINE_IOWA_A01667091_2017 13
## 48 A_SWINE_MISSOURI_A01932424_2017 13
## 57 A_SWINE_NORTH_CAROLINA_A01785281_2017 13
## 59 A_SWINE_NORTH_CAROLINA_A02214775_2017 13
## 65 A_SWINE_OHIO_A02214848_2017 13
## 66 A_SWINE_OHIO_A02215367_2017 13
## 71 A_SWINE_TEXAS_A02214607_2017 13
## 47 A_SWINE_MISSOURI_A01672819_2017 12
## 56 A_SWINE_NORTH_CAROLINA_A01672751_2017 12
## 64 A_SWINE_OHIO_A02214229_2017 12
## 69 A_SWINE_OKLAHOMA_A01672680_2017 10
## 70 A_SWINE_OKLAHOMA_A02214419_2017 10
max(n_occur$Freq)
## [1] 16
min(n_occur$Freq)
## [1] 10
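The code that builds `n_occur` is not shown. Its `Var1`/`Freq` columns, the factor levels printed later, and row names that keep each strain's alphabetical position are all consistent with `as.data.frame(table(...))` followed by ordering on `Freq`. A plausible reconstruction, assuming `Freq` counts the rows per strain in `jmx_data_all_clsI_matrix`, is sketched here on toy data:

```r
# Toy stand-in for the long Class I matrix: one row per strain x present epitope
toy_matrix <- data.frame(
  Sequence = c("STRAIN_1", "STRAIN_2", "STRAIN_2", "STRAIN_3", "STRAIN_3", "STRAIN_3"),
  Epitope  = c("E1", "E1", "E2", "E1", "E2", "E3"),
  stringsAsFactors = FALSE
)
# Rows per strain; as.data.frame(table(...)) names the columns Var1 and Freq
n_occur_toy <- as.data.frame(table(toy_matrix$Sequence))
# Most epitopes first; row names keep each strain's alphabetical position
n_occur_toy <- n_occur_toy[order(-n_occur_toy$Freq), ]
n_occur_toy
# Shortlist with a frequency cut-off (15 for the real Class I data)
shortlist_toy <- n_occur_toy[n_occur_toy$Freq >= 2, ]
```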
# Keep strains whose Class I count (Freq) is at least 15; the maximum observed is 16
jmx_data_all_clsI_freq <- n_occur[n_occur$Freq >= 15,]
nrow(jmx_data_all_clsI_freq)
## [1] 52
jmx_data_all_clsI_freq
## Var1 Freq
## 2 A_SWINE_ILLINOIS_A01672343_2017 16
## 4 A_SWINE_ILLINOIS_A02214663_2017 16
## 6 A_SWINE_ILLINOIS_A02218178_2017 16
## 7 A_SWINE_ILLINOIS_A02219783_2017 16
## 8 A_SWINE_INDIANA_A01672825_2017 16
## 13 A_SWINE_IOWA_A01667088_2017 16
## 17 A_SWINE_IOWA_A01672415_2017 16
## 27 A_SWINE_IOWA_A02216046_2017 16
## 30 A_SWINE_IOWA_A02217313_2017 16
## 35 A_SWINE_IOWA_A02221506_2017 16
## 37 A_SWINE_KANSAS_A01378019_2017 16
## 39 A_SWINE_KANSAS_A01378037_2017 16
## 40 A_SWINE_KANSAS_A01378038_2017 16
## 41 A_SWINE_MICHIGAN_A01259076_2017 16
## 42 A_SWINE_MICHIGAN_A02214235_2017 16
## 45 A_SWINE_MINNESOTA_A02214666_2017 16
## 52 A_SWINE_NEBRASKA_A01672345_2017 16
## 55 A_SWINE_NORTH_CAROLINA_A01672011_2017 16
## 58 A_SWINE_NORTH_CAROLINA_A01785282_2017 16
## 68 A_SWINE_OHIO_A02219547_2017 16
## 1 A_SWINE_ARKANSAS_A02218161_2017 15
## 5 A_SWINE_ILLINOIS_A02215204_2017 15
## 10 A_SWINE_INDIANA_A02216644_2017 15
## 11 A_SWINE_INDIANA_A02218180_2017 15
## 12 A_SWINE_IOWA_A01104104_2017 15
## 16 A_SWINE_IOWA_A01672342_2017 15
## 18 A_SWINE_IOWA_A01672518_2017 15
## 19 A_SWINE_IOWA_A01672824_2017 15
## 20 A_SWINE_IOWA_A01932420_2017 15
## 21 A_SWINE_IOWA_A02214479_2017 15
## 22 A_SWINE_IOWA_A02214835_2017 15
## 23 A_SWINE_IOWA_A02215038_2017 15
## 25 A_SWINE_IOWA_A02215202_2017 15
## 26 A_SWINE_IOWA_A02216044_2017 15
## 29 A_SWINE_IOWA_A02217282_2017 15
## 31 A_SWINE_IOWA_A02218171_2017 15
## 32 A_SWINE_IOWA_A02218750_2017 15
## 34 A_SWINE_IOWA_A02221505_2017 15
## 36 A_SWINE_IOWA_A02221508_2017 15
## 38 A_SWINE_KANSAS_A01378027_2017 15
## 44 A_SWINE_MINNESOTA_A01672344_2017 15
## 46 A_SWINE_MINNESOTA_A02214846_2017 15
## 49 A_SWINE_MISSOURI_A02214279_2017 15
## 50 A_SWINE_MISSOURI_A02216048_2017 15
## 51 A_SWINE_MISSOURI_A02218334_2017 15
## 53 A_SWINE_NEBRASKA_A02216645_2017 15
## 54 A_SWINE_NEBRASKA_A02219793_2017 15
## 60 A_SWINE_OHIO_17TOSU1384_2017 15
## 61 A_SWINE_OHIO_17TOSU1386_2017 15
## 62 A_SWINE_OHIO_A01354304_2017 15
## 63 A_SWINE_OHIO_A01354305_2017 15
## 72 A_SWINE_WISCONSIN_A01104100_2017 15
head(jmx_data_2_clsII_strain[,2:5])
## # A tibble: 6 x 4
## Sequence Segment String Epitopes
## <chr> <chr> <chr> <chr>
## 1 A_SWINE_KANSAS_A01378019_2017 SEGMENT_2 1 MMGMFNMLS
## 2 A_SWINE_KANSAS_A01378019_2017 SEGMENT_2 2 RYGFVANFS
## 3 A_SWINE_KANSAS_A01378019_2017 SEGMENT_2 4 MFNMLSTVL
## 4 A_SWINE_KANSAS_A01378019_2017 SEGMENT_2 5 FVANFSMEL
## 5 A_SWINE_KANSAS_A01378019_2017 SEGMENT_2 5 FNMLSTVLG
## 6 A_SWINE_KANSAS_A01378019_2017 SEGMENT_2 8 LSTVLGVSI
colnames(jmx_data_all_clsII_matrix)
## [1] "Sequence" "Segment" "Epitopes" "Count"
head(jmx_data_all_clsII_matrix)
## # A tibble: 6 x 4
## # Groups: Sequence, Segment [1]
## Sequence Segment Epitopes Count
## <chr> <chr> <chr> <int>
## 1 A_SWINE_ARKANSAS_A02218161_2017 SEGMENT_2 FNMLSTVLG 8
## 2 A_SWINE_ARKANSAS_A02218161_2017 SEGMENT_2 FSMELPSFG 8
## 3 A_SWINE_ARKANSAS_A02218161_2017 SEGMENT_2 FVANFSMEL 8
## 4 A_SWINE_ARKANSAS_A02218161_2017 SEGMENT_2 LSTVLGVSI 8
## 5 A_SWINE_ARKANSAS_A02218161_2017 SEGMENT_2 MFNMLSTVL 8
## 6 A_SWINE_ARKANSAS_A02218161_2017 SEGMENT_2 MMGMFNMLS 8
n_occur_clsII
## Var1 Freq
## 12 A_SWINE_IOWA_A01104104_2017 649
## 18 A_SWINE_IOWA_A01672518_2017 649
## 25 A_SWINE_IOWA_A02215202_2017 649
## 53 A_SWINE_NEBRASKA_A02216645_2017 649
## 23 A_SWINE_IOWA_A02215038_2017 638
## 34 A_SWINE_IOWA_A02221505_2017 608
## 38 A_SWINE_KANSAS_A01378027_2017 600
## 69 A_SWINE_OKLAHOMA_A01672680_2017 563
## 55 A_SWINE_NORTH_CAROLINA_A01672011_2017 562
## 2 A_SWINE_ILLINOIS_A01672343_2017 559
## 4 A_SWINE_ILLINOIS_A02214663_2017 559
## 6 A_SWINE_ILLINOIS_A02218178_2017 559
## 7 A_SWINE_ILLINOIS_A02219783_2017 559
## 9 A_SWINE_INDIANA_A02214845_2017 559
## 13 A_SWINE_IOWA_A01667088_2017 559
## 17 A_SWINE_IOWA_A01672415_2017 559
## 27 A_SWINE_IOWA_A02216046_2017 559
## 30 A_SWINE_IOWA_A02217313_2017 559
## 33 A_SWINE_IOWA_A02218755_2017 559
## 35 A_SWINE_IOWA_A02221506_2017 559
## 41 A_SWINE_MICHIGAN_A01259076_2017 559
## 42 A_SWINE_MICHIGAN_A02214235_2017 559
## 45 A_SWINE_MINNESOTA_A02214666_2017 559
## 46 A_SWINE_MINNESOTA_A02214846_2017 559
## 52 A_SWINE_NEBRASKA_A01672345_2017 559
## 65 A_SWINE_OHIO_A02214848_2017 559
## 68 A_SWINE_OHIO_A02219547_2017 559
## 37 A_SWINE_KANSAS_A01378019_2017 556
## 8 A_SWINE_INDIANA_A01672825_2017 554
## 11 A_SWINE_INDIANA_A02218180_2017 548
## 60 A_SWINE_OHIO_17TOSU1384_2017 548
## 61 A_SWINE_OHIO_17TOSU1386_2017 548
## 62 A_SWINE_OHIO_A01354304_2017 548
## 63 A_SWINE_OHIO_A01354305_2017 548
## 72 A_SWINE_WISCONSIN_A01104100_2017 548
## 24 A_SWINE_IOWA_A02215041_2017 547
## 50 A_SWINE_MISSOURI_A02216048_2017 546
## 56 A_SWINE_NORTH_CAROLINA_A01672751_2017 543
## 57 A_SWINE_NORTH_CAROLINA_A01785281_2017 543
## 3 A_SWINE_ILLINOIS_A01932036_2017 541
## 5 A_SWINE_ILLINOIS_A02215204_2017 541
## 22 A_SWINE_IOWA_A02214835_2017 539
## 70 A_SWINE_OKLAHOMA_A02214419_2017 532
## 58 A_SWINE_NORTH_CAROLINA_A01785282_2017 528
## 43 A_SWINE_MINNESOTA_A01667100_2017 520
## 28 A_SWINE_IOWA_A02216456_2017 519
## 10 A_SWINE_INDIANA_A02216644_2017 518
## 64 A_SWINE_OHIO_A02214229_2017 518
## 66 A_SWINE_OHIO_A02215367_2017 518
## 54 A_SWINE_NEBRASKA_A02219793_2017 511
## 29 A_SWINE_IOWA_A02217282_2017 507
## 39 A_SWINE_KANSAS_A01378037_2017 507
## 40 A_SWINE_KANSAS_A01378038_2017 507
## 67 A_SWINE_OHIO_A02216472_2017 497
## 1 A_SWINE_ARKANSAS_A02218161_2017 480
## 20 A_SWINE_IOWA_A01932420_2017 480
## 31 A_SWINE_IOWA_A02218171_2017 480
## 36 A_SWINE_IOWA_A02221508_2017 480
## 44 A_SWINE_MINNESOTA_A01672344_2017 480
## 26 A_SWINE_IOWA_A02216044_2017 465
## 16 A_SWINE_IOWA_A01672342_2017 462
## 51 A_SWINE_MISSOURI_A02218334_2017 462
## 59 A_SWINE_NORTH_CAROLINA_A02214775_2017 446
## 14 A_SWINE_IOWA_A01667089_2017 440
## 19 A_SWINE_IOWA_A01672824_2017 438
## 21 A_SWINE_IOWA_A02214479_2017 438
## 49 A_SWINE_MISSOURI_A02214279_2017 438
## 15 A_SWINE_IOWA_A01667091_2017 423
## 48 A_SWINE_MISSOURI_A01932424_2017 420
## 71 A_SWINE_TEXAS_A02214607_2017 414
## 32 A_SWINE_IOWA_A02218750_2017 413
## 47 A_SWINE_MISSOURI_A01672819_2017 391
max(n_occur_clsII$Freq)
## [1] 649
min(n_occur_clsII$Freq)
## [1] 391
Figure 2.2 | Stacked bar chart of the total number of epitopes (broken down by antigen) found in each H1N1 swine IAV strain. A reference line is drawn across the bars; strains with a total frequency at or above this line are considered.
# Keep strains at or above the reference line (Freq >= 555)
jmx_data_all_clsII_freq <- n_occur_clsII[n_occur_clsII$Freq >= 555,]
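The per-strain totals and the cut-off can be sketched with `xtabs()`, which produces the strain x antigen matrix a stacked bar chart draws. This is a toy illustration that assumes each strain's total in `n_occur_clsII` is the sum of the `Count` column across antigens; the chunk that computes it is not shown, so treat that as an assumption.

```r
# Toy Class II counts per strain and antigen (hypothetical values)
jmx_long <- data.frame(
  Sequence = c("STRAIN_1", "STRAIN_1", "STRAIN_2", "STRAIN_2", "STRAIN_3"),
  Segment  = c("SEGMENT_2", "SEGMENT_4", "SEGMENT_2", "SEGMENT_4", "SEGMENT_2"),
  Count    = c(300, 280, 250, 200, 500),
  stringsAsFactors = FALSE
)
# Strain x antigen totals: the stacked pieces of each bar
by_antigen <- xtabs(Count ~ Sequence + Segment, data = jmx_long)
# Bar height = total across antigens
totals <- rowSums(by_antigen)
cutoff <- 555   # position of the reference line used for Class II
shortlisted <- names(totals)[totals >= cutoff]
shortlisted
```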
nrow(jmx_data_all_clsII_freq)
## [1] 28
jmx_data_all_clsII_freq
## Var1 Freq
## 12 A_SWINE_IOWA_A01104104_2017 649
## 18 A_SWINE_IOWA_A01672518_2017 649
## 25 A_SWINE_IOWA_A02215202_2017 649
## 53 A_SWINE_NEBRASKA_A02216645_2017 649
## 23 A_SWINE_IOWA_A02215038_2017 638
## 34 A_SWINE_IOWA_A02221505_2017 608
## 38 A_SWINE_KANSAS_A01378027_2017 600
## 69 A_SWINE_OKLAHOMA_A01672680_2017 563
## 55 A_SWINE_NORTH_CAROLINA_A01672011_2017 562
## 2 A_SWINE_ILLINOIS_A01672343_2017 559
## 4 A_SWINE_ILLINOIS_A02214663_2017 559
## 6 A_SWINE_ILLINOIS_A02218178_2017 559
## 7 A_SWINE_ILLINOIS_A02219783_2017 559
## 9 A_SWINE_INDIANA_A02214845_2017 559
## 13 A_SWINE_IOWA_A01667088_2017 559
## 17 A_SWINE_IOWA_A01672415_2017 559
## 27 A_SWINE_IOWA_A02216046_2017 559
## 30 A_SWINE_IOWA_A02217313_2017 559
## 33 A_SWINE_IOWA_A02218755_2017 559
## 35 A_SWINE_IOWA_A02221506_2017 559
## 41 A_SWINE_MICHIGAN_A01259076_2017 559
## 42 A_SWINE_MICHIGAN_A02214235_2017 559
## 45 A_SWINE_MINNESOTA_A02214666_2017 559
## 46 A_SWINE_MINNESOTA_A02214846_2017 559
## 52 A_SWINE_NEBRASKA_A01672345_2017 559
## 65 A_SWINE_OHIO_A02214848_2017 559
## 68 A_SWINE_OHIO_A02219547_2017 559
## 37 A_SWINE_KANSAS_A01378019_2017 556
# Strains shortlisted in both the Class I and Class II JMX analyses (Class I order is kept)
jmx_overlap <- jmx_data_all_clsI_freq$Var1 %in% jmx_data_all_clsII_freq$Var1
jmx_common <- jmx_data_all_clsI_freq[jmx_overlap,]
jmx_common[,1]
## [1] A_SWINE_ILLINOIS_A01672343_2017
## [2] A_SWINE_ILLINOIS_A02214663_2017
## [3] A_SWINE_ILLINOIS_A02218178_2017
## [4] A_SWINE_ILLINOIS_A02219783_2017
## [5] A_SWINE_IOWA_A01667088_2017
## [6] A_SWINE_IOWA_A01672415_2017
## [7] A_SWINE_IOWA_A02216046_2017
## [8] A_SWINE_IOWA_A02217313_2017
## [9] A_SWINE_IOWA_A02221506_2017
## [10] A_SWINE_KANSAS_A01378019_2017
## [11] A_SWINE_MICHIGAN_A01259076_2017
## [12] A_SWINE_MICHIGAN_A02214235_2017
## [13] A_SWINE_MINNESOTA_A02214666_2017
## [14] A_SWINE_NEBRASKA_A01672345_2017
## [15] A_SWINE_NORTH_CAROLINA_A01672011_2017
## [16] A_SWINE_OHIO_A02219547_2017
## [17] A_SWINE_IOWA_A01104104_2017
## [18] A_SWINE_IOWA_A01672518_2017
## [19] A_SWINE_IOWA_A02215038_2017
## [20] A_SWINE_IOWA_A02215202_2017
## [21] A_SWINE_IOWA_A02221505_2017
## [22] A_SWINE_KANSAS_A01378027_2017
## [23] A_SWINE_MINNESOTA_A02214846_2017
## [24] A_SWINE_NEBRASKA_A02216645_2017
## 72 Levels: A_SWINE_ARKANSAS_A02218161_2017 ...
24 swine IAV strains are shared between the Class I and Class II JMX shortlists, meaning these 24 strains have relatively high epitope matches to the DNA vaccine epitopes.
# Which EpiCC-common strains are also JMX-common
common_both <- epicc_common$Sequence %in% jmx_common$Var1
which(common_both == TRUE)
## [1] 1 3 5 6 7 8
epicc_common[common_both,1]
## # A tibble: 6 x 1
## Sequence
## <chr>
## 1 A_SWINE_KANSAS_A01378027_2017
## 2 A_SWINE_IOWA_A02215038_2017
## 3 A_SWINE_IOWA_A01104104_2017
## 4 A_SWINE_IOWA_A01672518_2017
## 5 A_SWINE_IOWA_A02215202_2017
## 6 A_SWINE_NEBRASKA_A02216645_2017
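As a cross-check of this final step, base R's `intersect()` applied to the two shortlists returns the same six strains; it keeps the order of its first argument, so the EpiCC ordering is preserved. The vectors in this sketch are copied by hand from the printed `epicc_common` and `jmx_common` output rather than computed.

```r
# Shortlists copied from the printed epicc_common and jmx_common output
epicc_common_ids <- c(
  "A_SWINE_KANSAS_A01378027_2017", "A_SWINE_ILLINOIS_A01932036_2017",
  "A_SWINE_IOWA_A02215038_2017", "A_SWINE_KANSAS_A01378038_2017",
  "A_SWINE_IOWA_A01104104_2017", "A_SWINE_IOWA_A01672518_2017",
  "A_SWINE_IOWA_A02215202_2017", "A_SWINE_NEBRASKA_A02216645_2017"
)
jmx_common_ids <- c(
  "A_SWINE_ILLINOIS_A01672343_2017", "A_SWINE_ILLINOIS_A02214663_2017",
  "A_SWINE_ILLINOIS_A02218178_2017", "A_SWINE_ILLINOIS_A02219783_2017",
  "A_SWINE_IOWA_A01667088_2017", "A_SWINE_IOWA_A01672415_2017",
  "A_SWINE_IOWA_A02216046_2017", "A_SWINE_IOWA_A02217313_2017",
  "A_SWINE_IOWA_A02221506_2017", "A_SWINE_KANSAS_A01378019_2017",
  "A_SWINE_MICHIGAN_A01259076_2017", "A_SWINE_MICHIGAN_A02214235_2017",
  "A_SWINE_MINNESOTA_A02214666_2017", "A_SWINE_NEBRASKA_A01672345_2017",
  "A_SWINE_NORTH_CAROLINA_A01672011_2017", "A_SWINE_OHIO_A02219547_2017",
  "A_SWINE_IOWA_A01104104_2017", "A_SWINE_IOWA_A01672518_2017",
  "A_SWINE_IOWA_A02215038_2017", "A_SWINE_IOWA_A02215202_2017",
  "A_SWINE_IOWA_A02221505_2017", "A_SWINE_KANSAS_A01378027_2017",
  "A_SWINE_MINNESOTA_A02214846_2017", "A_SWINE_NEBRASKA_A02216645_2017"
)
# intersect() keeps the order of its first argument (the EpiCC shortlist)
challenge_candidates <- intersect(epicc_common_ids, jmx_common_ids)
challenge_candidates
which(epicc_common_ids %in% jmx_common_ids)
```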
Conclusion: the analysis shortlists a total of 6 H1N1 swine IAV strains that could be used as challenge strains. The final decision rests with the collaborators, since several other factors (e.g. the antibody profile) must be considered before picking a challenge strain for their vaccine study in pigs.